Skip to content

boot: bootutil: Fix max image size computation for swap-move and swap-offset #2283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

taltenbach
Copy link
Contributor

@taltenbach taltenbach commented Apr 27, 2025

When computing the maximum image size in bootutil_max_image_size for the swap-move or swap-offset strategy, the computation was using the size of the flash area provided as argument and was not taking into account the size of the padding sector. This was causing an incorrect (too large) size to be returned in the following cases:

  • Both primary and secondary slots have the same size.
  • The primary and secondary slots don't have the same size, but the routine is called for the slot containing the padding sector.

For example, let's imagine swap-move is being used on a device having a sector size S and two slots of N bytes. This is valid configuration and the maximum image size is N - S - T, T being the size of the trailer rounded up to the next multiple of S. When calling bootutil_max_image_size with either the primary or secondary slot, the size N - T is returned, which is incorrect.

This was not detected by the tests since, due to limitations of the simulator, the swap-move and swap-offset strategies were not tested with the largest possible images. In addition to fixing the issue in bootutil_max_image_size, this PR modifies the simulator to allow the simulator to use largest possible images for all strategies and therefore detect those kind of issues.

taltenbach added 4 commits May 1, 2025 17:07
When computing the maximum image size in bootutil_max_image_size for
swap-move or swap-offset strategy, the computation was using the size of
the flash area provided as argument and was not taking into account the
size of the padding sector. This was causing an incorrect size to be
returned in some cases, for example when the two slots have the same
size or when the slots haven't the same size but the routine is called
for the slot containing the padding sector.

For example, let's imagine swap-move is being used on a device having a
sector size S and two slots of N bytes. This is valid configuration and
the maximum image size is N - S - T, T being the size of the trailer
rounded up to the next multiple of S. When calling
bootutil_max_image_size with either the primary or secondary slot, the
size N - T is returned, which is incorrect.

This commit fixes the issue by computing always the maximum image using
the size of the slot containing the padding and substracting the size of
the padding and of the aligned trailer.

Signed-off-by: Thomas Altenbach <[email protected]>
When checking the size of an image in bootutil_img_validate, the offset
to the end of the TLV area was used as the image size in all cases.
However, when using swap-offset, the upgrade image is written in the
secondary with an offset. This offset is not part of the image and must
therefore not be taken into account in the image size.

Signed-off-by: Thomas Altenbach <[email protected]>
For the swap-move and swap-offset strategies, the computation of the
largest image size was not taking taking into account the padding that
is needed when using those strategies. Due to this limitation, the
simulator is currently using hardcoded image sizes, smaller than the
maximum possible size, when running tests for the swap-move or
swap-offset strategies. This commit fixes the maximum image size
computation for those strategies.

Signed-off-by: Thomas Altenbach <[email protected]>
The simulator was testing the upgrade with the largest image possible
for all strategies, except for overwrite-only, swap-move and swap-offset
because some tests were failing when the maximum image size was used.

For overwrite-only, this was due to an incorrect trailer size
computation. This has been fixed by
88294be.

For swap-move and swap-offset, this was due to the simulator not taking
into account the padding needed by those strategies in the primary or
secondary slot, but also to incorrect computation of the maximum image
size in some cases by the MCUboot library. Both issues have been fixed
by the previous commits.

Since all those issues have been fixed, the simulator can now be
configured to test upgrade with the largest possible image for all
strategies.

Note that logic needed to generate image of a given image is kept even
if not useful anymore at the moment, since that might be needed when
test will be added to ensure proper behavior when images of different
sizes are used.

Signed-off-by: Thomas Altenbach <[email protected]>
@taltenbach taltenbach force-pushed the fix/swap-move-swap-offset-max-img-size branch from a494479 to 52ba928 Compare May 1, 2025 15:08
Copy link

@tomchy tomchy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my tests, fixing this comparison will reveal another issue - that swap_scramble_trailer_sectors(..) is capable of erasing image in the secondary slot, as it assumes that the trailer is present in both of them.
As a result - the device may become unresponsive after the update is tested.

I am using a bit outdated source code ATM, will try to reproduce it on the main/w PR too.

EDIT: Reproduced on the main branch too.

Steps to reproduce:

  • Configure two "efficient" slots.
  • Use the application that is close to the size limit.
  • Flash it alongside the MCUboot.
  • Build a smaller update candidate.
  • Trigger the update.
  • Neither old nor the new application will boot.

Logs:

*** Booting MCUboot v2.1.0-dev-7bdc0e1ebfd8 ***
*** Using nRF Connect SDK v3.0.99-503b0c832dfa ***
*** Using Zephyr OS v4.1.99-31155f95d8a4 ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: test
I: Starting swap using move algorithm.
E: Image in the primary slot is not valid!
E: Unable to find bootable image
*** Booting MCUboot v2.1.0-dev-7bdc0e1ebfd8 ***
*** Using nRF Connect SDK v3.0.99-503b0c832dfa ***
*** Using Zephyr OS v4.1.99-31155f95d8a4 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=bad, swap_type=0x0, copy_done=0x2, image_ok=0x2
I: Boot source: none
I: Image index: 0, Swap type: revert
E: Image in the secondary slot is not valid!
E: Image in the primary slot is not valid!
E: Unable to find bootable image

@de-nordic
Copy link
Collaborator

According to my tests, fixing this comparison will reveal another issue - that swap_scramble_trailer_sectors(..) is capable of erasing image in the secondary slot, as it assumes that the trailer is present in both of them. As a result - the device may become unresponsive after the update is tested.

I am using a bit outdated source code ATM, will try to reproduce it on the main/w PR too.

The swap_scramble_trailer_sectors is capable of erasing anything, it does not know what it is erasing, it is not even trying to identify it. It literally assumes that has been asked to "erase trailer sectors" on whatever has been provided via flash area pointer.

@de-nordic
Copy link
Collaborator

DNM to investigate issue

@nordicjm nordicjm added the [DNM] Do Not Merge label May 7, 2025
@nordicjm
Copy link
Collaborator

nordicjm commented May 7, 2025

According to my tests, fixing this comparison will reveal another issue - that swap_scramble_trailer_sectors(..) is capable of erasing image in the secondary slot, as it assumes that the trailer is present in both of them. As a result - the device may become unresponsive after the update is tested.

I am using a bit outdated source code ATM, will try to reproduce it on the main/w PR too.

EDIT: Reproduced on the main branch too.

Steps to reproduce:

* Configure two "efficient" slots.

* Use the application that is close to the size limit.

* Flash it alongside the MCUboot.

* Build a smaller update candidate.

* Trigger the update.

* Neither old nor the new application will boot.

Logs:

*** Booting MCUboot v2.1.0-dev-7bdc0e1ebfd8 ***
*** Using nRF Connect SDK v3.0.99-503b0c832dfa ***
*** Using Zephyr OS v4.1.99-31155f95d8a4 ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: test
I: Starting swap using move algorithm.
E: Image in the primary slot is not valid!
E: Unable to find bootable image
*** Booting MCUboot v2.1.0-dev-7bdc0e1ebfd8 ***
*** Using nRF Connect SDK v3.0.99-503b0c832dfa ***
*** Using Zephyr OS v4.1.99-31155f95d8a4 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=bad, swap_type=0x0, copy_done=0x2, image_ok=0x2
I: Boot source: none
I: Image index: 0, Swap type: revert
E: Image in the secondary slot is not valid!
E: Image in the primary slot is not valid!
E: Unable to find bootable image

You are using NCS so really the results are invalid anyway, cannot replicate with nrf5340dk using upstream (rebased this branch on main).

Original application:

Memory region         Used Size  Region Size  %age Used
           FLASH:      446620 B     450224 B     99.20%
             RAM:       58408 B       448 KB     12.73%
        IDT_LIST:          0 GB        32 KB      0.00%

Update candidate:

Memory region         Used Size  Region Size  %age Used
           FLASH:      172068 B     450224 B     38.22%
             RAM:       58408 B       448 KB     12.73%
        IDT_LIST:          0 GB        32 KB      0.00%

Upgrade:

*** Booting MCUboot v2.1.0-rc1-314-gdfe6918f73cd ***
*** Using Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0x10000
I: Image version: v0.0.0
I: Jumping to the first image slot

uart:~$ *** Booting Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
uart:~$ [00:00:00.002,075] <inf> littlefs: LittleFS version 2.10, disk version 2.1
uart:~$ [00:00:00.002,258] <inf> littlefs: FS at flash-controller@39000:0xf8000 is 8 0x1000-byte blocks with 512 cycle
uart:~$ [00:00:00.002,258] <inf> littlefs: partition sizes: rd 16 ; pr 16 ; ca 64 ; la 32
uart:~$ uart:~$ [00:00:00.020,843] <inf> smp_sample: build time: May  7 2025 07:50:13
uart:~$ [00:00:00.024,902] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
uart:~$ [00:00:00.024,932] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
uart:~$ [00:00:00.024,963] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 75.62666 Build 867997738
uart:~$ [00:00:00.026,397] <inf> bt_hci_core: HCI transport: IPC
uart:~$ [00:00:00.026,489] <inf> bt_hci_core: Identity: F6:39:AA:7E:BA:99 (random)
uart:~$ [00:00:00.026,519] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x20fb, manufacturer 0x0059
uart:~$ [00:00:00.026,550] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x20fb
uart:~$ [00:00:00.028,259] <inf> smp_bt_sample: Advertising successfully started

And revert:

kernel reboot cold�*** Booting MCUboot v2.1.0-rc1-314-gdfe6918f73cd ***
*** Using Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: revert
I: Starting swap using move algorithm.
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Bootloader chainload address offset: 0x10000
I: Image version: v0.0.0
I: Jumping to the first image slot

uart:~$ *** Booting Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
uart:~$ [00:00:00.002,075] <inf> littlefs: LittleFS version 2.10, disk version 2.1
uart:~$ [00:00:00.002,258] <inf> littlefs: FS at flash-controller@39000:0xf8000 is 8 0x1000-byte blocks with 512 cycle
uart:~$ [00:00:00.002,258] <inf> littlefs: partition sizes: rd 16 ; pr 16 ; ca 64 ; la 32
uart:~$ uart:~$ [00:00:00.020,812] <inf> smp_sample: build 222 time: May  7 2025 07:48:48
uart:~$ [00:00:00.068,084] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
uart:~$ [00:00:00.068,115] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
uart:~$ [00:00:00.068,115] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 75.62666 Build 867997738
uart:~$ [00:00:00.069,549] <inf> bt_hci_core: HCI transport: IPC
uart:~$ [00:00:00.069,641] <inf> bt_hci_core: Identity: F6:39:AA:7E:BA:99 (random)
uart:~$ [00:00:00.069,671] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x20fb, manufacturer 0x0059
uart:~$ [00:00:00.069,702] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x20fb
uart:~$ [00:00:00.071,411] <inf> smp_bt_sample: Advertising successfully started
uart:~$ uart:~$ 

Can see the 2 different images with the 222 in the build string output

@nordicjm nordicjm removed the [DNM] Do Not Merge label May 7, 2025
@nordicjm
Copy link
Collaborator

nordicjm commented May 7, 2025

Have tried with an even closer to limit sized original image and can see no problems (swap using move):

Memory region         Used Size  Region Size  %age Used
           FLASH:      450028 B     450224 B     99.96%
             RAM:       58408 B       448 KB     12.73%
        IDT_LIST:          0 GB        32 KB      0.00%

Though @taltenbach I think the calculations here are just wrong, have tried changing slot sizes but seemingly hitting a wall, with 110 sectors in slot1 I get:

D: Non-optimal sector distribution, slot0 has 110 usable sectors (112 assigned) but slot1 has 109 assigned

and with 109 I get:

W: Cannot upgrade: not a compatible amount of sectors
D: slot0 sectors: 112, slot1 sectors: 110, usable slot0 sectors: 109

I need to use 111, which is definitely not right because there is a swap sector, then there is the trailer sector, so 110 should be correct. But anyway, with 111 sectors, I tried again and upgrade was successful:

*** Booting MCUboot v2.1.0-rc1-314-gdfe6918f73cd ***
*** Using Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0x10000
I: Image version: v0.0.0
I: Jumping to the first image slot

uart:~$ *** Booting Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
uart:~$ [00:00:00.002,075] <inf> littlefs: LittleFS version 2.10, disk version 2.1
uart:~$ [00:00:00.002,258] <inf> littlefs: FS at flash-controller@39000:0xf8000 is 8 0x1000-byte blocks with 512 cycle
uart:~$ [00:00:00.002,258] <inf> littlefs: partition sizes: rd 16 ; pr 16 ; ca 64 ; la 32
uart:~$ uart:~$ [00:00:00.020,843] <inf> smp_sample: build time: May  7 2025 08:11:18
uart:~$ [00:00:00.024,932] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
uart:~$ [00:00:00.024,932] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
uart:~$ [00:00:00.024,963] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 75.62666 Build 867997738
uart:~$ [00:00:00.026,397] <inf> bt_hci_core: HCI transport: IPC
uart:~$ [00:00:00.026,489] <inf> bt_hci_core: Identity: F6:39:AA:7E:BA:99 (random)
uart:~$ [00:00:00.026,519] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x20fb, manufacturer 0x0059
uart:~$ [00:00:00.026,550] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x20fb
uart:~$ [00:00:00.028,259] <inf> smp_bt_sample: Advertising successfully started
uart:~$ uart:~$ 

And revert was also successful:

kernel reboot cold
kernel reboot cold�*** Booting MCUboot v2.1.0-rc1-314-gdfe6918f73cd ***
*** Using Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: revert
I: Starting swap using move algorithm.
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Bootloader chainload address offset: 0x10000
I: Image version: v0.0.0
I: Jumping to the first image slot

uart:~$ *** Booting Zephyr OS build v4.1.0-3567-g3dcc1d2dd6e1 ***
uart:~$ [00:00:00.002,075] <inf> littlefs: LittleFS version 2.10, disk version 2.1
uart:~$ [00:00:00.002,258] <inf> littlefs: FS at flash-controller@39000:0xf8000 is 8 0x1000-byte blocks with 512 cycle
uart:~$ [00:00:00.002,258] <inf> littlefs: partition sizes: rd 16 ; pr 16 ; ca 64 ; la 32
uart:~$ uart:~$ [00:00:00.020,843] <inf> smp_sample: build 222 time: May  7 2025 08:10:51
uart:~$ [00:00:00.068,359] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
uart:~$ [00:00:00.068,389] <inf> bt_hci_core: HW Variant: nRF53x (0x0003)
uart:~$ [00:00:00.068,420] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 75.62666 Build 867997738
uart:~$ [00:00:00.069,854] <inf> bt_hci_core: HCI transport: IPC
uart:~$ [00:00:00.069,915] <inf> bt_hci_core: Identity: F6:39:AA:7E:BA:99 (random)
uart:~$ [00:00:00.069,946] <inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x20fb, manufacturer 0x0059
uart:~$ [00:00:00.069,976] <inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x20fb
uart:~$ [00:00:00.071,685] <inf> smp_bt_sample: Advertising successfully started
uart:~$ uart:~$ 

So cannot reproduce your problem @tomchy which is likely a downstream issue

@tomchy
Copy link

tomchy commented May 7, 2025

I've just reproduced the issue using nRF52840DK with the following setup:

&slot0_partition {
        reg = <0x0000C000 0x00076000>;
};

&slot1_partition {
        reg = <0x00082000 0x00073000>;
};

Update candidate size:

-rw-r--r-- 470792 ./build/smp_svr/zephyr/zephyr.signed.bin

If the difference in slots is just a single page, than there is still memory allocated on the secondary slot for the trailer - you simply optimize the "move"/"scratchpad" area, so the issue is not reproduced.

@taltenbach
Copy link
Contributor Author

@tomchy Unless I missed something the layout you're using is not a valid swap-move layout. From the MCUboot design document, in the swap-move section:

This does imply, if there is any doubt, that the primary slot will be exactly one sector larger than the secondary slot due to the swap sector alone. It is the case that both the primary and secondary slots both have a trailer in addition to the application payload and these trailers are identical in size to one another.

The trailer is present in both the primary and secondary slots, so you have to allocate enough space in the secondary slot for the trailer, and that space must be sector-aligned. So, assuming a 0x76000-byte primary slot and 4096-byte sectors, your secondary slot must be 0x75000-byte large. Note in practice you can use a larger secondary slot (e.g. by simplicity some layouts use identical size for the primary and secondary slot) but that is not required and is just wasting flash memory.

So, in your case, the size of the secondary slot is the limiting factor and the maximum image size is 0x73000 - 4096 = 466944 bytes. Therefore, the update image you provided is too large. This was not detected by the image size check when validating the secondary image because bootutil_max_image_size is assuming your flash layout is valid so doesn't consider the size of the secondary slot.

Perhaps we want bootutil_max_image_size to consider the size of both slots to ensure it will return the correct size even if the layout is not a valid swap-move/swap-offset layout? Do you have any opinion on that point @nordicjm @de-nordic?

Also, I'm a bit surprised boot_slots_compatible hasn't detected the layout was invalid. The configuration is:

num_sectors_pri: 118, num_sectors_sec: 115, num_usable_sectors_pri: 116

So this check is considering the layout is fine:

    if ((num_sectors_pri != num_sectors_sec) &&
            (num_sectors_pri != (num_sectors_sec + 1)) &&
            (num_usable_sectors_pri != (num_sectors_sec + 1))) {
        BOOT_LOG_WRN("Cannot upgrade: not a compatible amount of sectors");
        BOOT_LOG_DBG("slot0 sectors: %d, slot1 sectors: %d, usable slot0 sectors: %d",
                     (int)num_sectors_pri, (int)num_sectors_sec,
                     (int)(num_usable_sectors_pri - 1));
        return 0;
    } else if (num_sectors_pri > BOOT_MAX_IMG_SECTORS) {
        BOOT_LOG_WRN("Cannot upgrade: more sectors than allowed");
        return 0;
    }

@nordicjm Isn't num_usable_sectors_pri != (num_sectors_sec + 1) wrong? I would have expected (num_usable_sectors_pri + 1) != num_sectors_sec.

@taltenbach
Copy link
Contributor Author

taltenbach commented May 8, 2025

@nordicjm Regarding:

I need to use 111, which is definitely not right because there is a swap sector, then there is the trailer sector, so 110 should be correct.

If your primary slot has 112 sectors and the trailer fits in a single sector, then I don't understand why having a secondary slot of 110 sectors should be correct since you need the trailer sector also in the secondary slot:

  • Primary slot: Firmware image (110 sectors) + Swap sector (1 sector) + Trailer (1 sector) = 112 sectors
  • Secondary slot: Firmware image (110 sectors) + Trailer (1 sector) = 111 sectors

The trailer of the secondary slot is not used for storing swap status during the upgrade/revert but is still needed for marking the upgrade as pending.

@nordicjm
Copy link
Collaborator

nordicjm commented May 9, 2025

@nordicjm Regarding:

I need to use 111, which is definitely not right because there is a swap sector, then there is the trailer sector, so 110 should be correct.

If your primary slot has 112 sectors and the trailer fits in a single sector, then I don't understand why having a secondary slot of 110 sectors should be correct since you need the trailer sector also in the secondary slot:

* Primary slot: Firmware image (110 sectors) + Swap sector (1 sector) + Trailer (1 sector) = 112 sectors

* Secondary slot: Firmware image (110 sectors) + Trailer (1 sector) = 111 sectors

The trailer of the secondary slot is not used for storing swap status during the upgrade/revert but is still needed for marking the upgrade as pending.

It should be putting that after the image trailer (not the update progress trailer) without a gap (unlike in primary, where there is a gap), no? Though actually I guess you are right either way because if the image reverts then you will be placing that status in the primary slot which now contains the secondary slot image, so you need to include that in the size of the secondary slot so yes, primary slot should always be the same size or exactly 1 sector more for swap using move mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants